{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "_id": "4861CE9389C147C29DEEDB19BE6E22BD",
    "id": "BACA2D8578234029A18E3F3FD0DE035D",
    "jupyter": {},
    "notebookId": "652d2b6a8e5eda4b7f08a7ea",
    "runtime": {
     "execution_status": null,
     "is_visible": false,
     "status": "default"
    },
    "scrolled": false,
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "# 练习：贝叶斯推断\n",
    "\n",
    "### 🔈任务概述\n",
    "\n",
    "在本次作业中，我们将进一步探索贝叶斯模型在心理学实验中的应用，探究自我加工的优势通畅使用匹配范式任务中“自我（self）”和“他人（other）”的认知匹配之间的关系 (Sui et al., 2012)，通过分析不同实验条件下的数据，探讨 \"Label\"（self 和 other）对反应时间的影响。\n",
    "\n",
    "![Image Name](https://cdn.kesci.com/upload/smipfxtgj4.png?imageView2/0/w/480/h/640)  \n",
    "\n",
    "> Sui, J., He, X., & Humphreys, G. W. (2012). Perceptual effects of social salience: Evidence from self-prioritization effects on perceptual matching. Journal of Experimental Psychology: Human Perception and Performance, 38(5), 1105–1117. https://doi.org/10.1037/a0029792  \n",
    "\n",
    "\n",
    "在lecture7 ～ lecture9 中，我们已经基于 Kolvoort et al. (2020) 的数据，针对两个示例被试（编号为 \"201\" 和 \"205\"）进行了详细的演示，介绍了贝叶斯推断的基本步骤与方法。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "现在，请你根据不同被试的信息，独立完成贝叶斯推断的基本流程。\n",
    "\n",
    "![Image Name](https://cdn.kesci.com/upload/sms4lt6fnj.png?imageView2/0/w/720/h/960)\n",
    "\n",
    "### 📚 研究问题\n",
    "标签（self 和 other）是否对反应时间有显著影响？具体来说，\"self\" 条件下的反应时间是否显著短于 \"other\" 条件下的反应时间？\n",
    "\n",
    "### 🌟 目标解锁🚀📊\n",
    "1. 数据读取和预处理\n",
    "2. 建立 PyMC 贝叶斯模型\n",
    "3. MCMC 采样与诊断\n",
    "4. HDI 检验：计算最高密度区间 (HDI) 和实际等效区间 (ROPE) \n",
    "5. 贝叶斯因子计算\n",
    "\n",
    "### 🔄 任务练习指引\n",
    "在以下练习中，按照步骤逐步完成贝叶斯推断流程。你可以参考 lecture7 ～ lecture9 课件中的内容，每个步骤中均提供了任务指引。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 导入必要的库\n",
    "import pymc as pm\n",
    "import arviz as az\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "# 设置随机种子以确保结果的可重复性\n",
    "np.random.seed(123)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. 数据读取和预处理\n",
    "\n",
    "在这一练习中，您需要使用指定编号的被试数据，并筛选出关键变量。\n",
    "\n",
    "具体步骤：\n",
    "1. 筛选出目标编号的被试，匹配类型为\"Matching\"的数据\n",
    "2. 选择需要的两列数据\"Label\", \"RT_sec\"（“Label”编码规则：self=0，other=1）\n",
    "3. 设置索引\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过 pd.read_csv 加载数据 Kolvoort_2020_HBM_Exp1_Clean.csv\n",
    "try:\n",
    "  df_raw = pd.read_csv('/home/mw/input/bayes3797/Kolvoort_2020_HBM_Exp1_Clean.csv')\n",
    "except:\n",
    "  df_raw = pd.read_csv('data/Kolvoort_2020_HBM_Exp1_Clean.csv')\n",
    "\n",
    "df_raw.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 筛选出目标编号的被试，匹配类型为\"Matching\"的数据\n",
    "\n",
    "\n",
    "# 选择需要的两列\"Label\", \"RT_sec\"\n",
    "df = ...\n",
    "\n",
    "# 重新编码标签（Label）\n",
    "\n",
    "\n",
    "# #设置索引\n",
    "\n",
    "\n",
    "# 显示部分数据\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 通过PyMC建立贝叶斯模型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "基于自我匹配范式的简单线性回归模型：\n",
    "\n",
    "$$\n",
    "Y_i \\mid \\beta_0, \\beta_1, \\sigma \\sim \\mathcal{N}(\\mu_i, \\sigma^2)\n",
    "$$\n",
    "\n",
    "**先验（prior）**  \n",
    "\n",
    "* > $\\beta_{0}   \\sim N\\left(5, 2^2 \\right)$  \n",
    "  * > 模型的截距项服从均值为 5，标准差为 2 的正态分布。  \n",
    "* > $\\beta_1   \\sim N\\left(0, 1^2 \\right)$  \n",
    "  * > 模型的斜率项，服从均值为 0，标准差为 1 的正态分布。  \n",
    "* > $\\sigma   \\sim \\text{Exp}(0.3)$  \n",
    "  * > 代表误差项的标准差，服从参数为 0.3 的指数分布。  \n",
    "\n",
    "**似然（likelihood）**  \n",
    "\n",
    "* > $\\mu_i = \\beta_0 + \\beta_1X_i$  \n",
    "* > $Y_i {\\sim} N\\left(\\mu_i, \\sigma^2\\right)$  \n",
    "\n",
    "\n",
    "<div style=\"padding-bottom: 30px;\"></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 设置随机种子以确保结果可复现\n",
    "np.random.seed(123)\n",
    "\n",
    "# 通过 pm.Model() 创建模型\n",
    "with pm.Model() as linear_model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with linear_model:\n",
    "    trace = pm.sample()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型诊断"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import arviz as az\n",
    "\n",
    "# 补全...\n",
    "ax = az.plot_trace(...)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "az.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## HDI + ROPE 检验\n",
    "\n",
    "假设 参数 $\\beta_1$ 的值在 $[-0.05, 0.05]$ 范围内可以视为实用等效区间，使用`az.plot_posterior`绘制后验分布，显示 HDI 和 ROPE。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idata = trace.posterior[...].values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idata = az.from_dict()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# HDI + ROPE 检验\n",
    "# 定义实际等效区间 (ROPE)\n",
    "rope_interval = [...,...]  \n",
    "\n",
    "az.plot_posterior()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 贝叶斯因子检验\n",
    "### Testing against the point-null\n",
    "\n",
    "\n",
    "我们可以首先从最简单的假设开始，即假设没有效应，self和other条件下的反应时间相同，即**探究 $\\beta_1$ 是否等于零**🤔\n",
    "\n",
    "\n",
    "- 零假设 $H_0: \\beta_1 = 0$ \n",
    "- 备择假设 $H_1: \\beta_1 \\neq 0$\n",
    "\n",
    "通过`az.plot_bf`函数进行Testing against the point-null，对先验和后验分布在零假设点处的概率密度值进行可视化。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idata = az.from_dict()\n",
    "az.plot_bf()\n",
    "sns.despine()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 作业提交tips\n",
    "\n",
    "1. 本次作业将进行人工检查，请确保提交的代码能够顺利运行且无错误。运行失败的代码将影响评分结果，请在提交前仔细测试。\n",
    "2. 在截止日期之前，可以多次提交修改后的作业，我们会以最终版本为准。\n",
    "3. 为了提高可读性，请确保代码简洁、清晰，并添加必要的注释。\n",
    "4. 所有作业将在截止时间后统一检查，请务必在截止时间前完成最终提交。过期提交将不予接受，请合理安排时间。\n",
    "5. 如有问题，请提前咨询助教或参考课件内容。\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
